Search | WHO COVID-19 Research Database

An Open Natural Language Processing Development Framework for EHR-based Clinical Research: A case demonstration using the National COVID Cohort Collaborative (N3C) (preprint)

Sijia Liu; Andrew Wen; Liwei Wang; Huan He; Sunyang Fu; Robert Miller; Andrew Williams; Daniel Harris; Ramakanth Kavuluru; Mei Liu; Noor Abu-el-rub; Dalton Schutte; Rui Zhang; Masoud Rouhizadeh; John D. Osborne; Yongqun He; Umit Topaloglu; Stephanie S Hong; Joel H Saltz; Thomas Schaffter; Emily Pfaff; Christopher G. Chute; Tim Duong; Melissa A. Haendel; Rafael Fuentes; Peter Szolovits; Hua Xu; Hongfang Liu; National COVID Cohort Collaborative Natural Language Processing Subgroup; National COVID Cohort Collaborative.

arXiv; 2021.

Preprint in English | PREPRINT-ARXIV | ID: ppzbmed-2110.10780v3

ABSTRACT

Despite the latest advances in clinical natural language processing (NLP), there is some resistance in the clinical and translational research community to adopting NLP models because of their limited transparency, interpretability, and usability. In this study, we propose an open NLP development framework and evaluate it through the implementation of NLP algorithms for the National COVID Cohort Collaborative (N3C). Motivated by interest in information extraction from COVID-19-related clinical notes, our work includes 1) an open data annotation process using COVID-19 signs and symptoms as the use case, 2) a community-driven ruleset composing platform, and 3) a synthetic text data generation workflow to generate texts for information extraction tasks without involving human subjects. The corpora were derived from texts from three different institutions (Mayo Clinic, University of Kentucky, University of Minnesota). The gold standard annotations were tested with a single institution's (Mayo) ruleset, yielding F-scores of 0.876, 0.706, and 0.694 on the Mayo, Minnesota, and Kentucky test datasets, respectively. This study, a consortium effort of the N3C NLP subgroup, demonstrates the feasibility of creating a federated NLP algorithm development and benchmarking platform to enhance multi-institution clinical NLP study and adoption. Although we use COVID-19 as a use case in this effort, our framework is general enough to be applied to other domains of interest in clinical NLP.
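For readers unfamiliar with the metric reported in this abstract, the following is a minimal sketch of how an F-score can be computed when a ruleset's output is compared with gold standard annotations. The abstract does not state which F-score variant or matching criterion was used; the exact-match comparison, the balanced default (beta = 1), the function name `f_score`, and the sample annotations are all assumptions made for illustration.

```python
def f_score(gold, predicted, beta=1.0):
    """Return (precision, recall, F-beta) for two collections of annotations.

    An annotation counts as correct only if it matches a gold annotation
    exactly (an assumed criterion, not one taken from the abstract).
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical annotations: (document id, start offset, end offset, label).
gold = {
    ("note1", 10, 15, "fever"),
    ("note1", 40, 45, "cough"),
    ("note2", 5, 12, "fatigue"),
    ("note2", 30, 38, "headache"),
}
predicted = {
    ("note1", 10, 15, "fever"),
    ("note1", 40, 45, "cough"),
    ("note2", 5, 12, "fatigue"),
    ("note2", 50, 58, "dyspnea"),  # spurious extraction
}

precision, recall, f1 = f_score(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Relaxed (overlap-based) span matching would give different counts; the exact-match rule here is only the simplest choice.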

Subject(s)

COVID-19

Characterizing Long COVID: Deep Phenotype of a Complex Condition (preprint)

Rachel R Deer; Madeline A Rock; Nicole Vasilevsky; Leigh C Carmody; Halie M Rando; Alfred J Anzalone; Tiffany J Callahan; Carolyn T Bramante; Christopher G Chute; Casey S Greene; Joel J Gagnier; Haitao Chu; Farrukh M Koraishy; Chen Liang; Feifan Liu; Charisse R Madlock-Brown; Diego R Mazzotti; Douglas S McNair; Ann M Parker; Ben D Coleman; Hannah E Davis; Mallory A Perry; Justin T Reese; Joel H Saltz; Anthony E Solomonides; Anupam A Sule; Gary S Stein; Sebastian Kohler; Teshamae S Monteith; Vithal Madhira; Wesley D Kimble; Ramakanth Kavuluru; William B Hillegass; Lauren E Chan; James Brian Byrd; Eilis A Boudreau; Hongfang Liu; Julie A McMurry; Emily R Pfaff; Nicolas Matentzoglu; Rose Relevo; Richard A Moffitt; Robert A Schuff; Julian Solway; Heidi Spratt; Timothy Bergquist; Tellen D Bennett; Marc D Basson; Umit Topaloglu; Liwei Wang; Melissa A Haendel; Peter N Robinson.

medRxiv; 2021.

Preprint in English | medRxiv | ID: ppzbmed-10.1101.2021.06.23.21259416

ABSTRACT

Importance: Since late 2019, the novel coronavirus SARS-CoV-2 has given rise to a global pandemic and introduced many health challenges with economic, social, and political consequences. In addition to a complex acute presentation that can affect multiple organ systems, there is mounting evidence of various persistent long-term sequelae. The worldwide scientific community is characterizing a diverse range of seemingly common long-term outcomes associated with SARS-CoV-2 infection, but the underlying assumptions in these studies vary widely, making comparisons difficult. Numerous publications describe the clinical manifestations of post-acute sequelae of SARS-CoV-2 infection (PASC or long COVID), but they are difficult to integrate because of heterogeneous methods and the lack of a standard for denoting the many phenotypic manifestations of long COVID. Observations: We identified 303 articles published before April 29, 2021, curated 59 relevant manuscripts that described clinical manifestations in 81 cohorts of individuals three weeks or more following acute COVID-19, and mapped 287 unique clinical findings to Human Phenotype Ontology (HPO) terms. Conclusions and Relevance: Patients and clinicians often use different terms to describe the same symptom or condition. Addressing the heterogeneous and inconsistent language used to describe the clinical manifestations of long COVID, together with the lack of standardized terminologies for long COVID, will provide a necessary foundation for comparison and meta-analysis of different studies. Translating long COVID manifestations into computable HPO terms will improve the analysis, data capture, and classification of long COVID patients. If researchers, clinicians, and patients share a common language, then studies can be compared or pooled more effectively. Furthermore, mapping lay terminology to HPO for long COVID manifestations will help patients assist clinicians and researchers in creating phenotypic characterizations that are computationally accessible, which may improve the stratification and thereby diagnosis and treatment of long COVID.

Subject(s)

COVID-19

A Hierarchical Bayesian Model for Stochastic Spatiotemporal SIR Modeling and Prediction of COVID-19 Cases and Hospitalizations (preprint)

Curtis B. Storlie; Ricardo L. Rojas; Gabriel O. Demuth; Benjamin D. Pollock; Patrick W. Johnson; Patrick M. Wilson; Ethan P. Heinzen; Hongfang Liu; Rickey E. Carter; Sean C. Dowdy; Shannon M. Dunlay; Elizabeth B. Habermann; Daryl J. Kor; Matthew R. Neville; Andrew H. Limper; Katherine H. Noe; Mohamad Bydon; Pablo Moreno Franco; Priya Sampathkumar; Nilay D. Shah; Henry H. Ting.

arXiv; 2021.

Preprint in English | PREPRINT-ARXIV | ID: ppzbmed-2104.04033v1

ABSTRACT

Most COVID-19 predictive modeling efforts use statistical or mathematical models to predict national- and state-level COVID-19 cases or deaths in the future. These approaches assume that parameters such as reproduction time, test positivity rate, hospitalization rate, and social intervention effectiveness (masking, distancing, and mobility) are constant. However, the one certainty with the COVID-19 pandemic is that these parameters change over time and vary across counties and states. In fact, the rate of spread over a region, hospitalization rate, hospital length of stay and mortality rate, the proportion of the population that is susceptible, test positivity rate, and social behaviors can all change significantly over time. Thus, the quantification of uncertainty becomes critical in making meaningful and accurate forecasts of the future. Bayesian approaches are a natural way to fully represent this uncertainty in mathematical models and have become particularly popular in physics and engineering models. The explicit integration of time-varying parameters and uncertainty quantification into a hierarchical Bayesian forecast model differentiates the Mayo COVID-19 model from other forecasting models. By accounting for all sources of uncertainty in both parameter estimation and future trends with a Bayesian approach, the Mayo COVID-19 model accurately forecasts future cases and hospitalizations, as well as the degree of uncertainty. This approach has been remarkably accurate and a linchpin in Mayo Clinic's response to managing the COVID-19 pandemic. The model accurately predicted the timing and extent of the summer and fall surges at Mayo Clinic sites, allowing hospital leadership to manage resources effectively to provide a successful pandemic response. This model has also proven to be very useful to the state of Minnesota in helping guide difficult policy decisions.
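To make the idea of a stochastic SIR model with time-varying parameters concrete, here is a minimal sketch of a discrete-time (chain-binomial) simulation in which the transmission rate changes from day to day. It is not the Mayo COVID-19 model: it has no spatial component, no hospitalization compartment, and no Bayesian inference. The function names, the population size, and the rate values are assumptions chosen only for illustration.

```python
import math
import random


def binomial(n, p, rng):
    """Draw from Binomial(n, p) by summing n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_sir(population, initial_infected, betas, gamma, seed=0):
    """Simulate daily (S, I, R) counts with one transmission rate per day.

    betas: sequence of daily transmission rates (the time-varying parameter).
    gamma: daily recovery rate, held constant here for simplicity.
    """
    rng = random.Random(seed)
    s, i, r = population - initial_infected, initial_infected, 0
    history = [(s, i, r)]
    p_recover = 1.0 - math.exp(-gamma)
    for beta in betas:
        # Probability that a susceptible person is infected during this day.
        p_infect = 1.0 - math.exp(-beta * i / population)
        new_infections = binomial(s, p_infect, rng)
        new_recoveries = binomial(i, p_recover, rng)
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical scenario: transmission drops after day 30 (e.g. an intervention).
betas = [0.4] * 30 + [0.15] * 30
history = simulate_sir(population=1000, initial_infected=5,
                       betas=betas, gamma=0.1, seed=42)
peak_infected = max(i for _, i, _ in history)
print("peak infected:", peak_infected)
```

Running the simulation with many seeds gives a spread of trajectories, which is a crude picture of forecast uncertainty; a Bayesian model as described in the abstract would additionally place distributions on the parameters themselves.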

Subject(s)

COVID-19

An Aberration Detection-Based Approach for Sentinel Syndromic Surveillance of COVID-19 and Other Novel Influenza-Like Illnesses (preprint)

Andrew Wen; Liwei Wang; Huan He; Sijia Liu; Sunyang Fu; Sunghwan Sohn; Jacob A Kugel; Vinod C Kaggal; Ming Huang; Yanshan Wang; Feichen Shen; Jungwei Fan; Hongfang Liu.

medRxiv; 2020.

Preprint in English | medRxiv | ID: ppzbmed-10.1101.2020.06.08.20124990

ABSTRACT

Coronavirus Disease 2019 has emerged as a significant global concern, triggering harsh public health restrictions in a successful bid to curb its exponential growth. As discussion shifts towards relaxation of these restrictions, there is significant concern about second-wave resurgence. The key to managing these outbreaks is early detection and intervention, and yet there is significant lag time associated with the use of laboratory-confirmed cases for surveillance purposes. To address this, syndromic surveillance can be considered as a timelier alternative for first-line screening. Existing syndromic surveillance solutions are, however, typically focused on a known disease and have limited capability to distinguish between outbreaks of individual diseases sharing similar syndromes. This poses a challenge for surveillance of COVID-19, as its active periods tend to overlap temporally with those of other influenza-like illnesses. In this study we explore performing sentinel syndromic surveillance for COVID-19 and other influenza-like illnesses using a deep learning-based approach. Our methods are based on aberration detection utilizing autoencoders that leverage symptom prevalence distributions to distinguish outbreaks of two ongoing diseases that share similar syndromes, even if they occur concurrently. We first demonstrate that this approach works for detection of outbreaks of influenza, which has known temporal boundaries. We then demonstrate that the autoencoder can be trained to not alert on known and well-managed influenza-like illnesses such as the common cold and influenza. Finally, we applied our approach to 2019-2020 data in the context of a COVID-19 syndromic surveillance task to demonstrate how implementation of such a system could have provided early warning of an outbreak of a novel influenza-like illness that did not match the symptom prevalence profile of influenza and other known influenza-like illnesses.
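The general idea of aberration detection on symptom prevalence profiles can be illustrated with a much simpler stand-in than the autoencoder described in this abstract. The sketch below learns a baseline profile from days with known illnesses and flags a day whose profile lies unusually far from it; an autoencoder would use reconstruction error in place of this distance. The symptom names, prevalence values, the three-standard-deviation threshold, and the function names are all assumptions for illustration.

```python
import math
import statistics


def distance(a, b):
    """Euclidean distance between two symptom-prevalence vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_baseline(train_days):
    """Return (mean profile, alert threshold) learned from baseline days."""
    n_symptoms = len(train_days[0])
    profile = [statistics.fmean(day[k] for day in train_days)
               for k in range(n_symptoms)]
    scores = [distance(day, profile) for day in train_days]
    # Assumed rule: alert when a score exceeds mean + 3 standard deviations.
    threshold = statistics.fmean(scores) + 3 * statistics.pstdev(scores)
    return profile, threshold


def is_aberrant(day, profile, threshold):
    """True if the day's profile is farther from baseline than the threshold."""
    return distance(day, profile) > threshold


# Hypothetical daily prevalence of (fever, cough, loss of smell) among visits.
baseline_days = [
    [0.30, 0.40, 0.01],
    [0.32, 0.38, 0.02],
    [0.28, 0.42, 0.01],
    [0.31, 0.39, 0.02],
]
profile, threshold = fit_baseline(baseline_days)

typical_day = [0.30, 0.40, 0.015]
unusual_day = [0.30, 0.40, 0.20]  # a symptom rarely seen in the baseline
print(is_aberrant(typical_day, profile, threshold))
print(is_aberrant(unusual_day, profile, threshold))
```

A distance to the mean profile captures only simple deviations; the appeal of an autoencoder in this setting is that it can learn nonlinear structure in what counts as a normal profile.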

Subject(s)

COVID-19
